Suprasegmental duration modelling with elastic constraints in automatic speech recognition

Authors

  • Laurence Molloy
  • Stephen Isard
Abstract

This paper presents a method of integrating a model of suprasegmental duration with an HMM-based recogniser at the post-processing level. The N-best utterance output is rescored using a suitable linear combination of acoustic log-likelihood (provided by a set of tied-state triphone HMMs) and duration log-likelihood (provided by a set of durational models). The durational model used in the post-processing imposes syllable-level elastic constraints on the durational behaviour of speech segments.
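
The rescoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` type, the `duration_weight` parameter, and all numeric scores below are hypothetical, and the abstract does not say how the combination weight is chosen or how the duration log-likelihood is computed, so both are treated here as given inputs.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    words: str
    acoustic_loglik: float  # assumed to come from the HMM recogniser
    duration_loglik: float  # assumed to come from a separate duration model


def combined_score(hyp: Hypothesis, duration_weight: float) -> float:
    """Linear combination of acoustic and duration log-likelihoods."""
    return hyp.acoustic_loglik + duration_weight * hyp.duration_loglik


def rescore_nbest(nbest: list[Hypothesis], duration_weight: float) -> list[Hypothesis]:
    """Return the N-best list reordered by combined score, best first."""
    return sorted(
        nbest,
        key=lambda h: combined_score(h, duration_weight),
        reverse=True,
    )


# Invented scores, for illustration only.
nbest = [
    Hypothesis("recognise speech", acoustic_loglik=-1200.0, duration_loglik=-60.0),
    Hypothesis("wreck a nice beach", acoustic_loglik=-1195.0, duration_loglik=-90.0),
]

best_acoustic_only = rescore_nbest(nbest, duration_weight=0.0)[0]
best_with_duration = rescore_nbest(nbest, duration_weight=0.5)[0]
```

With these invented numbers, a weight of 0.0 leaves the acoustically best hypothesis on top, while a weight of 0.5 lets the duration score promote the other one. In practice such a weight would presumably be tuned on held-out data, which is an assumption here rather than something the abstract states.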

Similar resources

Adding Word Duration Information to Bigram Language Models

Suprasegmental information, while generally thought to play an important role in speech recognition by human listeners, has shown little promise in previous attempts to integrate it into ASR systems. This paper outlines an approach that will successfully exploit suprasegmental information by modeling duration within the context of N-gram language modeling. Results show that up to half of the varia...

Automatic Segmentation of Continuous Speech on Word and Phrase Level based on Suprasegmental Features

This article investigates whether it is possible to segment continuous speech at the word and phrase level by examining suprasegmental parameters, in the case of bound-stress languages such as Hungarian and Finnish. The final aim is to increase the robustness of speech recognition at the language-modelling level by detecting word and phrase boundaries, so that we can significantly decrease the sear...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Automatic utterance type detection using suprasegmental features

The goal of the work presented here is to automatically predict the type of an utterance in spoken dialogue by using automatically extracted suprasegmental information. For this task, we present and compare three stochastic algorithms: hidden Markov models, artificial neural nets, and classification and regression trees. These models are easily trainable, reasonably robust and fit into the prob...

Context-dependent word duration modelling for robust speech recognition

Conventional hidden Markov models (HMMs) have weak duration constraints. This may cause the decoder to produce word matches with unrealistic durations in noisy situations. This paper describes techniques for modelling context-dependent word duration cues and incorporating them directly in a multi-stack decoding algorithm. The proposed model is capable of penalising duration constraints of a wor...

Publication date: 1998